Measures of centre (the mean, median and mode) are fundamental to the discipline of statistics. Yet previous research shows that students may not have a thorough conceptual understanding of these measures, even though these statistics are easy to calculate. This paper describes the findings of a study of pre-service teachers’ ideas about measures of centre.

The results indicate that while some participants held valid ideas about these statistics, a substantial proportion displayed little understanding of these measures.

When people use and make sense of statistical information, they can be said to be using 
statistical reasoning (Garfield & Chance, 2000). Garfield (2003) proposes that statistical 
reasoning has many facets, one of which involves “Understanding what measures of 
centre, spread, and position tell about a data set, knowing which are best to use under 
different conditions, and how they do or do not represent a data set” (p. 25). This paper 
examines first-year pre-service teachers’ understanding of these measures of centre. 

The most commonly used measures of centre are the mean, median and mode. The mean is what is commonly known as the average: the result of dividing the sum of the individual values in the data by the number of values. The median is the point at which half of the data lie above and half below. The mode is the most common value. Each of these measures is in some way representative of the data as a whole, and, depending on the nature of the data, one of these measures may be more representative than another. For example, if the data have one or more outliers, that is, values that are much higher or lower than the rest of the data, the mean is pulled towards them and may be higher or lower than most of the data, whereas the median is less affected and will be more representative. By the nature of its calculation, the mean is also the balance point of the data, in that the sum of the deviations above the mean equals the sum of the deviations below it. The mean is also the value that would result if the total of the data values were shared out evenly. It is this latter characteristic that leads to statements about the ‘average’ family having 3.2 children, a source of confusion to many (Mokros & Russell, 1995). Yet another interpretation of these measures of centre is that they are a way of finding the “signal” in a “noisy process” (Konold & Pollatsek, 2002, p. 260).
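The properties of the mean described above can be checked numerically. The following is a minimal Python sketch using the standard-library `statistics` module; the data set and variable names are hypothetical and illustrative only, not taken from this study. It computes the three measures and confirms two properties of the mean: the deviations above and below it balance, and giving every data point the mean reproduces the original total (the equal-sharing interpretation).

```python
from statistics import mean, median, mode

# Hypothetical data: number of children in eight families (illustrative only).
children = [0, 1, 2, 2, 2, 3, 3, 5]

m = mean(children)     # sum of the values divided by the number of values
md = median(children)  # middle value; with an even count, the mean of the two middle values
mo = mode(children)    # most common value

# Balance point: total deviation above the mean equals total deviation below it.
above = sum(x - m for x in children if x > m)
below = sum(m - x for x in children if x < m)

# Equal sharing: giving every family the mean reproduces the original total.
shared_total = m * len(children)

print(f"mean={m}, median={md}, mode={mo}")
print(f"deviation above={above}, below={below}")
print(f"original total={sum(children)}, shared total={shared_total}")
```

Here the mean is 2.25 children, a value no actual family can have, which is exactly the kind of result that produces the ‘average family’ confusion noted above.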

Mokros and Russell found that younger children (in Grade 3) had not yet developed the idea of treating several data points as a single entity, and therefore had no concept of a number that could represent a data set. They also found that students who were beginning to develop the idea of a representative number first used the modal value. This modal view of measures of centre was also reported by Watson and Moritz (2000) and Jones et al. (2000). As students grow older, however, they start to develop more sophisticated views of how a mean, in particular, can represent a data set (Watson & Moritz, 2000).

Mokros and Russell describe three ways in which older students in their study understood the mean: the mean as the midpoint, the mean as a reasonable value and the mean as the point of balance. While students with these views show a deeper understanding of the mean, each view has its limitations. For example, if a distribution is skewed, the mean will not coincide with the midpoint. According to Mokros and Russell, students who hold the reasonable view of the mean still lack a “precise definition of representativeness” (p. 37). The students they observed who saw the mean as a balance point were still struggling to connect the idea of representativeness with the “mathematical relationship” (p. 37).

A common finding in much of the previous research on understanding of the mean is that some students define the mean by its algorithm (see, for example, Groth & Bergner, 2006; Mokros & Russell, 1995). For these students, the mean is the algorithm, and because they do not see the mean (or any other measure of centre) as a representative number, they do not use it when it would be useful, such as for comparing data sets (see, for example, Konold & Pollatsek, 2002; Reaburn, 2012). These students appear to be familiar with the algorithm but have no further understanding of why the mean is calculated at all.

The aim of this study was to determine pre-service teachers’ understanding of 
measures of centre, in particular, to discover their understanding of the relative merits of 
the mean and median in data with an outlier. 

Method 


Participants 

The participants were volunteers from first- and second-year mathematics education units in a pre-service teaching course at an Australian university. There were 32 participants, 12 female and 20 male. Twenty-six of these students stated that they had completed a Year 12 mathematics unit.

The Tasks 

The participants were administered a questionnaire that asked them a series of questions about their understanding of averages in different contexts. The first question asked them to describe their reaction to the word ‘average’, and the second asked them to describe their understanding of the word average in a particular context. The third question asked them to calculate the mean, median and mode of a small number of scores. The details can be found in Figure 1.


1. What do you think of when you hear the word ‘average’?

2. The average family has 2.2 children, what do you think this meant?

3. The following are the scores Jane received on her mathematics tests in term 1.
All the tests were marked out of 10. 

4, 5, 7, 7, 1, 8, 9

a. Calculate the mean of these scores.

b. Calculate the median of these scores.

c. Calculate the mode of these scores.


Figure 1. Questions 1 to 3 in the questionnaire. 
The final question asked the students to consider a plot of the salt levels of different foods, in which one food, soy sauce, was an outlier. They were given the mean and median and asked a series of questions: to explain the difference between the values of the mean and median, to say which of these two measures of centre was the ‘best’ for these data, and to predict what would happen to the values of the mean and median if the soy sauce were removed from the data set. The details can be found in Figure 2.

4) The following graph shows the sodium levels of several different foods in mg of salt per 100 g of food. The mean salt content is 1400 mg/100 g, and the median salt content is 901 mg/100 g.


[Plot: salt content (mg/100 g) of the foods, with soy sauce as an outlier]

Plot produced in TinkerPlots (Konold & Miller, 2005)

a) Why do you think there is a difference between the values of the mean and the 
median? Please give your reasons. 

b) Which of the numbers, the mean or median, do you think best represents the data? 
Please give your reasons. 

c) What do you think would happen to the value of the mean and median if soy sauce 
was deleted from the data set? Please give your reasons. 

(This question was based on an idea from Watson & Beswick, 2009) 


Figure 2. Question 4 of the questionnaire. 


Results 


Question 1 

The answers are summarised in Table 1. This table shows that 12 students (approximately 38 per cent) saw the average as “the middle”. The next most common answer, given by 11 students, was to state the algorithm; a further three simply said that an average is the mean, median or mode. One student combined the two most common answers: “The middle score. The sum of all numbers added up then divided by how many there were.”
Table 1

Answers given by the participants to Question 1*

Answer                          Number (n = 32)
Amount per attempt               2
Middle number                   12
Normal                           3
Mean or median or mode           3
The algorithm                   11
Most common number               3
Expected                         1
Idiosyncratic or no answer       6
Mediocre                         1

*There are more than 32 answers; some students gave more than one answer.

Question 2 

This question was worded to prompt the students to give a more in-depth explanation of the average, and the answers are summarised in Table 2. It was apparent from the answers that the students were thinking of the mean as they answered this question.

Table 2

Answers given by the participants to Question 2*

Answer                                                 Number (n = 32)
If spread out evenly, there will be 2.2                 4
There is some variation around 2.2                      7
Most are close to 2                                     6
The mean or average is 2.2 or described algorithm      12
Normal is 2                                             2
Most common is 2                                        3
It is likely                                            1
No response or idiosyncratic                            5

*There are more than 32 answers; some students gave more than one answer.


Twelve of the students merely restated the problem, for example “The average is 2.2”, or described the algorithm. Four of the students gave an answer that reflects the equal-sharing concept of the mean (“spread out evenly”), and 13 stated either that most families have “close to” two children or that there is “some variation around” 2.2.

Question 3 

Table 3 shows that only approximately two thirds of the students calculated the mean, median and mode of the data correctly. The most common error was to add up the scores but not complete the algorithm with the division.
Table 3

Answers given by the students to Question 3

Answer            Number (n = 32)
Mean correct      22
Median correct    23
Mode correct      23


Question 4 

The answers to Question 4a are summarised in Table 4. This table shows that 11 (approximately one third) of the students correctly noted that the difference between the mean and median was due to the presence of the outlier. Eight of the students defined the mean and median but did not explain how these calculations led to different results for the two measures of centre. When it came to which was the “best” statistic in this situation (Question 4b), the students who answered were split between those who chose the median (7) and those who chose the mean (9) (Table 5). Four students stated that the mean is best because it “takes all the values into account,” three said it was “more correct” and two chose it because “it is the average.”

Table 4

Answers given by the students to Question 4a

Answer                                   Number (n = 32)
Because of the presence of the outlier   11
Defined mean and median                   8
Idiosyncratic                             2
No answer                                11

Table 5

Answers given by the students to Question 4b

Answer                                 Number (n = 32)
Median is best
  Median is more representative         3
  Median because of soy sauce           3
  Median, no further explanation        1
Mean is best
  Takes all values into account         4
  More correct                          3
  Because it is the average             2
No answer                              16
Table 6 shows that 13 of the students realised that removing the soy sauce from the data (Question 4c) would either bring the mean and median closer together or decrease the mean.

Table 6

Answers given by the students to Question 4c

Answer                                          Number (n = 32)
Mean and median will get closer together         7
Mean will decrease                               6
Median will remain the same/change a little      4
Median decrease                                  1
Both decrease                                    5
Nothing                                          1
Idiosyncratic                                    3
No answer                                       11


Discussion 

It is apparent from these data that some of the participants had ideas of an average that suggest that an average is “normal”, “expected”, “most common” or in “the middle”. This is reflected in the answers to Question 1 and Question 2. Question 1 was worded to elicit participants’ immediate response to the word average, while Question 2 was worded to elicit more in-depth responses. The responses to Question 2 showed that some of the participants understood that the values in a data set vary around the average in some way; others described this as the “number per attempt”. It is of note that the answers to Question 2 showed that the students were thinking of the mean, and not the median or mode. The answers to these questions also show that whereas some of the participants had a view of the measures of centre that indicates “reasonableness” or “the middle”, none gave answers that reflected a “balance point” view (Mokros & Russell, 1995, p. 37). Some participants, however, did have a view related to the idea of “signal from the noise” (Konold & Pollatsek, 2002, p. 260), in that they indicated that the data varied around the mean.

Question 3 required knowledge of the algorithms to calculate the mean, median and mode. Only about two thirds of the students correctly calculated these statistics for the given data, even though it was a simple data set and the values had already been placed in order of magnitude. The most common error was to add up the scores and then not divide by the number of scores. This error suggests that these participants did not understand that the answer should be in some way representative of the data, or should at least lie within the range of the data.
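The error described above can be made concrete with a minimal Python sketch. The test scores below are hypothetical (not the questionnaire data), and `within_range` is an illustrative helper, not part of any library. Stopping after the summation produces a number far outside the range of the scores, whereas the completed algorithm yields a value between the minimum and maximum, as the mean of any data set must.

```python
# Hypothetical test scores out of 10 (illustrative only, not the study's data).
scores = [3, 6, 6, 8, 9, 10]

incomplete = sum(scores)              # the error: adding up but not dividing
complete = sum(scores) / len(scores)  # the full algorithm for the mean

def within_range(value, data):
    """Hypothetical helper: True if value lies between the smallest and largest data values."""
    return min(data) <= value <= max(data)

print(incomplete, within_range(incomplete, scores))
print(complete, within_range(complete, scores))
```

A range check of this kind is one simple way a student could notice that an answer of 42 cannot represent scores marked out of 10.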

Question 4 asked the students why there was a difference between the values of the mean and median. Eleven students identified the presence of the outlier as the reason for the difference. When it came to a choice of which statistic was “best”, some participants chose the mean for the very reason that it is usually taught to be unsuitable in this situation: its calculation includes all the data, outlier included. Others seemed to feel that the mean was in some way more “correct.”
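Why including all the data counts against the mean here can be shown with a short Python sketch using the standard-library `statistics` module. The salt values are hypothetical, chosen only so that the last item plays the role of soy sauce; they are not the values in the plot used in this study.

```python
from statistics import mean, median

# Hypothetical salt contents (mg per 100 g); the last value stands in for soy sauce.
# These are NOT the values from the study's plot.
salt = [300, 450, 600, 700, 800, 900, 1000, 1200, 7550]
without_outlier = salt[:-1]  # drop the outlying item

mean_all, median_all = mean(salt), median(salt)
mean_trim, median_trim = mean(without_outlier), median(without_outlier)

# Count how many values fall below the mean: with the outlier present,
# the mean sits above almost all of the data.
below_mean = sum(1 for x in salt if x < mean_all)

print(f"with outlier:    mean={mean_all}, median={median_all}, values below mean={below_mean}")
print(f"without outlier: mean={mean_trim}, median={median_trim}")
```

With the outlier, the mean (1500) is larger than eight of the nine values, while the median (800) sits among them; removing the outlier lowers the mean by several hundred but the median by only 50, which is the pattern that the correct answers to Question 4c anticipated.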

One of the most striking features of the data was the number of participants who answered these questions either by restating the problem (“The average is 2.2”) or by describing the algorithms to calculate these measures of centre. These answers gave no indication that the participants had any idea that an average should be in some way representative of the data, and this is a concerning finding. It is also of note that there was a large number of non-responses, especially as the questionnaire progressed.

If “Understanding what measures of centre, spread, and position tell about a data set, knowing which are best to use under different conditions, and how they do or do not represent a data set” (Garfield, 2003, p. 25) is part of statistical reasoning, then it would appear that many of the participants in this study will not be successful in carrying out statistical reasoning. According to Konold and Pollatsek (2002), students who view an average as “typical” (p. 261), while showing some understanding of these measures, still have “little conceptual basis for using such statistical indices to characterize a set of data” (p. 261).

Implications for teaching 

It is now over 20 years since Mokros and Russell (1995) published their paper on the understanding of averages. In this paper we read, “We believe that children wedded to the algorithm must be pulled away from their narrow view of average as a procedure to focus on describing and comparing data sets” (p. 37). Yet it appears that the idea of the mean being the algorithm is still firmly entrenched in the minds of at least some, if not many, university students.

This is of particular concern when the students are pre-service teachers. If future teachers do not have the experience and knowledge of statistical concepts required, then it is unlikely that they will be able to provide their own students with the learning experiences needed to learn these concepts (Jacobbe, 2008). This study raises the question of why current university students have so little conceptual understanding of the most fundamental concepts in the discipline of statistics. Research into how students are being introduced to statistical reasoning, and how they respond to what is being presented, appears to be needed.